Skip to content

RXR 2396: update pct_weight_for_age() and related functions#66

Merged
mccarthy-m-g merged 18 commits intomasterfrom
RXR-2396
Apr 7, 2025
Merged

RXR 2396: update pct_weight_for_age() and related functions#66
mccarthy-m-g merged 18 commits intomasterfrom
RXR-2396

Conversation

@mccarthy-m-g
Copy link
Copy Markdown
Contributor

This PR is a first attempt at updating the pct_weight_for_age() and related functions. We use these functions in production code internally, so I've opted to add new pct_weight_for_age_v() (v for vectorized) functions rather than refactor the old ones, since the new versions would lead to breaking changes. Main changes in this PR:

  • I turned on roxygen's markdown parser, which is why so many files have been changed in this PR.
  • I added the CDC growth chart data sets as data sets in the package. This makes them easy to access, and means we don't need to read the CSV files constantly in function code.
  • I added functions for the CDC LMS equations.
  • I added vectorized version of pct_*_for_*_v() functions. The reason we want vectorized versions of these functions is for data cleaning; you can now use these in a dplyr::mutate() call to get make a new column of percentiles for each observation.

We should discuss whether to keep these as new/separate functions, or to introduce breaking changes to the old ones. If we do want to keep them separate, I can still update the old ones to work with the new CDC data/equations.

Here's some example output for the main functionality:

# Returns a vector of percentiles for the given physical measurement:
pct_weight_for_age_v(weight = 20, age = 5, sex = "female")
[1] 76.53524

# Set `return_vector = FALSE` to return a data frame with additional info:
pct_weight_for_age_v(
  weight = 20, age = 5, sex = "female", return_vector = FALSE
)
     sex weight age         L        M         S    P_obs      P01       P1
1 female     20   5 -1.284118 17.93048 0.1408402 76.53524 12.68933 13.64029
        P3       P5      P10      P15      P25      P50      P75      P85
1 14.27484 14.63913 15.24372 15.68509 16.39324 17.93048 19.84219 21.07625
       P90      P95      P97      P99    P999
1 22.01747 23.60511 24.78557 27.43144 33.9153

# Supply vectors of equal length to return information for each observation.
# This is particularly useful in calls to `dplyr::mutate()` or similar. 
pct_weight_for_age_v(
  weight = c(11, 7.2, 4.6, 4, 4.1),
  age = c(9.5, 6.1, 1.5, 2, 3),
  age_units = "months",
  sex = c("male", "female", "male", "male", "female")
)
[1] 90.543056 47.145711 33.764620  3.243798  1.613529

@roninsightrx
Copy link
Copy Markdown
Contributor

Looks good, playing around wit a bit. In principle changes to UI are fine with me.

One benefit though of the old version is that you can also request the median weight for a
specific age, something that occurs quite often. You do this by leaving out the weight argument:

pct_weight_for_age(age = 8, sex = "male", return_median = TRUE)

However, in the new implementation this doesn't seem possible (you have to supply a dummy value and it cannot be NA either), which is a bit counterintuitive:

pct_weight_for_age_v(age = 8, weight = 30, sex = "male", return_vector = FALSE)$P50

Can we make the weight argument optional for that use case?

Also, it seems the return_vector argument is implemented in a reversed way (return_vector=FALSE gives me the full vector).

@mccarthy-m-g
Copy link
Copy Markdown
Contributor Author

One benefit though of the old version is that you can also request the median weight for a
specific age, something that occurs quite often. You do this by leaving out the weight argument.

Could we instead add median_weight_for_age() (etc.) functions, where the weight argument isn't present? I found the old behaviour a bit confusing and awkward, and this would be clearer UX.

Also, it seems the return_vector argument is implemented in a reversed way (return_vector=FALSE gives me the full vector).

return_vector = FALSE returns a data frame, and return_vector = TRUE returns a numeric vector. It's clearer when you have multiple input values:

Screenshot 2025-03-24 at 3 10 37 PM

@roninsightrx
Copy link
Copy Markdown
Contributor

roninsightrx commented Mar 25, 2025

Could we instead add median_weight_for_age() (etc.) functions, where the weight argument isn't present? I found the old behaviour a bit confusing and awkward, and this would be clearer UX.

Sure, that's fine too, even better. I think it's important to have that feature easily accessible.

return_vector = FALSE returns a data frame, and return_vector = TRUE returns a numeric vector. It's clearer when you have multiple input values:

I see. It's still a bit counterintuitive to me. Because if I only pass one value (a common use case), the output is just a single value while with return_vector=FALSE I actually do get a vector (or at least what looks like a vector in the console). return_numeric or return_df would be more intuitive to me.

@mccarthy-m-g
Copy link
Copy Markdown
Contributor Author

Renamed the return_vector argument to return_numeric to avoid confusion, and added median_*_for_*() functions.

Copy link
Copy Markdown
Contributor

@roninsightrx roninsightrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. Few final comments:

  • we can also remove the old data (in inst), no?
  • one q about not exporting a helper function.

@mccarthy-m-g
Copy link
Copy Markdown
Contributor Author

we can also remove the old data (in inst), no?

Yeah, I don't think we need it anymore. If we do this, do we also want to remove the read_who_table() function? (it doesn't look like we're using this internally anywhere)

Copy link
Copy Markdown
Contributor

@roninsightrx roninsightrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Copy Markdown
Contributor

@jasmineirx jasmineirx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, very minor documentation suggestions only

@mccarthy-m-g mccarthy-m-g merged commit 104bf95 into master Apr 7, 2025
1 check passed
@mccarthy-m-g mccarthy-m-g deleted the RXR-2396 branch April 7, 2025 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants